TCOLS v1.20 - a table column filter

Revised 5-Apr-96. Copyright (c) 1996 by Rune Berg. TextTools Freeware.


Introduction
Usage
Options
Input Data: Fields And Separators
Simple Expressions
Expressions With Arithmetic
Expressions With Functions
Errors During Processing
More Examples
Expression Syntax
The Function Library
Limitations


INTRODUCTION

tcols is a filter for projecting and transforming data columns.

tcols runs from the command line or from batch files.

Input and output data are plain ASCII text lines. By default, each line is treated as a sequence of whitespace-separated fields (but see the -i option). Input is typically read from a file, and output is typically written to one.

For example, consider a text file "data", containing the following table (3 columns, 4 lines):

	john  45   tennis
	al    31   squash
	tom   25   beer
	paul  38   women

The command:

	tcols from data $3 $2

writes the third and second columns (separated by a tab) to the screen:

	tennis	45
	squash	31
	beer	25
	women	38

Here's another example, using the same file "data". The command:

	tcols from data to results $1 /loves/ $3.upp

writes the following to the file "results":

	john     loves    TENNIS
	al       loves    SQUASH
	tom      loves    BEER
	paul     loves    WOMEN

The last example shows the use of functions. tcols has functions for string manipulation, formatting, decimal/hex/octal conversions, and a few other things.

The above examples show only a few of tcols's capabilities, so read the next sections for a full description.

Note: All usage examples in this document are for tcols running on MS-DOS. Running tcols on a Unix shell requires quoting appropriate for the particular shell.


USAGE

tcols [log logfile] [options] [from infile] [to outfile] expr [...]

Where:

log logfile
Append error messages, summary, etc. to logfile instead of writing them to standard error (the screen).
If logfile does not exist, tcols will create it.
Useful if tcols is invoked as part of a batch job and you want a log-file of any messages etc. produced. Other TextTools programs have a similar log-file facility. (On systems where standard error can be redirected, you'll probably not need this facility.)
options
Flags to control various aspects of tcols's behaviour. See Options section.
from infile
Make tcols read from infile. If not given, tcols reads from standard input.
to outfile
Make tcols write to outfile. If not given, tcols writes to standard output.
expr
An expression, e.g. $3.
...
Further expressions.

[] denotes an optional item.

Upper/lower case for the 'log', 'from', and 'to' keywords is not significant. Also, these keywords should not be used as file names.


OPTIONS

-w
Don't abort on a processing error; just skip the bad line and write a warning to standard error (or logfile, if used). See Errors During Processing section for more details.
-v
Print banner with version number to standard error (or logfile, if used), then exit.
-he
Print summary of expression usage to standard output, then exit.
-hf
Print summary of all functions to standard output, then exit.
-hf name
Print summary of named function to standard output, then exit.
-r
Print one-line summary to standard error (or logfile, if used), when processing is completed.
This option will have no effect if processing is aborted due to an error.
-iC
Separate input fields by character C (except \).
Use \t to form a tab.
-oS
Separate output fields by string S, instead of the default tab character.
Use \t to form a tab, \\ to form a backslash.
-o recognizes no other escaped characters.
E.g. "-o   " separates output fields by three spaces.
-o: separates output fields by a colon.
-o\t\t separates output fields by two tabs.
-o\\ separates output fields by a backslash.
-o\z separates output fields by a backslash and a 'z'.
-o\ separates output fields by a backslash.
-o prints output fields right next to each other.
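
The escape rules for -oS can be modelled in a few lines. The following Python sketch is only an illustration of the rules listed above, not tcols's own code; the function name decode_separator is made up for this example.

```python
def decode_separator(s):
    # Hypothetical model of the -oS rules: backslash-t becomes a tab,
    # a doubled backslash becomes one backslash, and every other
    # backslash sequence (including a lone trailing backslash) is kept.
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in "t\\":
            out.append("\t" if s[i + 1] == "t" else "\\")
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```

Under these assumptions, the text after -o in each of the examples above decodes to the separator described next to it.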


INPUT DATA : FIELDS AND SEPARATORS

The input data to tcols is ordinary ASCII text lines.
tcols sees each line as consisting of zero or more fields (denoted $1, $2, ...).


Whitespace-separated fields (default)

tcols sees each field as separated by at least one tab or blank, e.g.:

	john 37  butcher  (end-of-line)
	<--> <>  <----->
	$1   $2  $3

If an input line has no fields (i.e., consists of whitespace only), then tcols will write an empty line to the output, without evaluating the expression(s).

If you want a field to contain whitespace, then the field must be surrounded by single quotes, e.g. 'hey you', or by double quotes, e.g. "hey you".

If you want a single quote inside a singly quoted field, precede it by a backslash, e.g. 'It\'s allright'.

If you want a double quote inside a doubly quoted field, precede it by a backslash, e.g. "She said \"yes\"".

If you want a backslash inside a singly/doubly quoted field, precede it by another backslash, e.g. "a backslash: \\".

If you want a single quote inside a doubly quoted field, no special care is needed, e.g. "It's allright".

If you want a double quote inside a singly quoted field, no special care is needed, e.g. 'She said "yes"'.

'' and "" are valid fields.

If tcols finds an unmatched quote on an input line, then tcols reads that quote and the rest of the line as one field. For example:

	12 5654 'I feel good     8899     (newline)
	<> <--> <------------------------>
	$1 $2   $3

When tcols reads a quoted field from the input, tcols considers the surrounding quotes part of the field.
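
The quoting rules above can be summarised as a small scanner. This Python sketch is a model of the described behaviour, not tcols's implementation; split_fields is a hypothetical name. It assumes that a quote only has special meaning at the start of a field, which this document does not state explicitly.

```python
def split_fields(line):
    """Split one input line (without its newline) into fields."""
    fields = []
    i, n = 0, len(line)
    while i < n:
        if line[i] in " \t":
            i += 1                      # blanks and tabs separate fields
            continue
        start = i
        if line[i] in "'\"":
            quote = line[i]
            i += 1
            while i < n and line[i] != quote:
                # a backslash protects the character after it
                i += 2 if line[i] == "\\" and i + 1 < n else 1
            if i < n:
                i += 1                  # keep the closing quote in the field
            # otherwise the quote is unmatched: the field runs to end of line
        else:
            while i < n and line[i] not in " \t":
                i += 1
        fields.append(line[start:i])
    return fields
```

Note that the surrounding quotes stay in the field, as described above; use suqt or duqt to remove them.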


Character-separated fields

If you use the -iC option, tcols uses the character C to separate the fields on an input line. In this discussion, we'll consider comma-separated input data.

As an example, this is how tcols -i, would see the following input line:

	Al,   42, shoe salesman,married,2, Dodge  (newline)
	<> <---> <------------> <-----> - <------>
	$1 $2    $3             $4      $5 $6

Any text between the start of the line and the first comma, between two commas, or between the last comma and the end of the line, constitutes a field.

Two commas right next to each other constitute an empty field; this is perfectly legal.

If an input line consists of whitespace only, then tcols will write an empty line to the output, without evaluating the expression(s). Otherwise, whitespace has no special significance when you're using the -i option.

Quotes have no special significance when you're using the -i option.

If you want a comma inside a field, precede it by a backslash: \,
If you want a backslash inside a field, precede it by a backslash: \\

Make sure your input data does not have unwanted spaces at the end of lines!
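
As an illustration of the -iC rules, here is a Python sketch of character-separated splitting. It is not tcols's code, and split_on_char is a hypothetical name. It assumes that the backslash of an escaped separator or backslash is dropped from the resulting field; this document does not say whether tcols keeps it. The special handling of whitespace-only lines is not modelled.

```python
def split_on_char(line, sep=","):
    """Split a line on sep; a backslash protects sep and backslash."""
    fields, current = [], []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in (sep, "\\"):
            current.append(line[i + 1])   # assumption: the escape backslash is dropped
            i += 2
        elif ch == sep:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)            # whitespace and quotes are ordinary characters
            i += 1
    fields.append("".join(current))
    return fields
```

Trailing spaces end up inside the last field, which is why unwanted spaces at the end of lines matter.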


Fixed-length fields

Use $r (raw line) and the subs function. For example, the command:

	tcols -o, from alpha.txt $r.subs(1,8) $r.subs(9,12) $r.subs(13,18)

where the file "alpha.txt" contains the line:

	abcdefghijklmnopqrstuvwxyz

yields the following output:

	abcdefgh,ijkl,mnopqr


SIMPLE EXPRESSIONS

Expressions specify how tcols should map input data to output data. tcols applies the expressions to each input line in turn, producing a corresponding output line. The only exception is empty input lines (lines that contain only whitespace); they result in an empty line on the output, without being evaluated.

Here are some simple expressions:

	$3	: Yields the third field on the input line.

	$1..4	: Yields the first ... fourth field on the input line.

	$l	: Yields the last field on the input line.

	$2..l	: Yields the second ... last field on the input line.

	$c	: Yields the count of fields on the input line.

	$r	: Yields the entire input line, whitespace and all.

	532	: Yields the literal integer 532.

	/hey/	: Yields the literal string hey.

In general, $N yields the N'th field, for integer N >= 1.

In general, $M..N yields the M'th ... N'th fields, for integers M,N >= 1, M <= N.

In general, $M..l yields the M'th ... last field, for M >= 1.

$c is useful for 'jaggy' tables: tables with an unknown number of fields on each line.

$r is useful when input lines cannot be treated as (e.g.) whitespace-separated fields.
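
The field selectors can be pictured as list slicing with 1-based, inclusive positions. The Python sketch below models that idea; select is a hypothetical helper, and raising an error for an out-of-range selection is an assumption that mirrors the "not enough fields" processing error.

```python
def select(fields, selector, raw_line=""):
    """Return the list of values a simple $-selector yields."""
    body = selector[1:]                 # drop the leading $
    if body == "r":
        return [raw_line]
    if body == "c":
        return [str(len(fields))]

    def position(token):
        # "l" stands for the last field
        return len(fields) if token == "l" else int(token)

    first, dots, last = body.partition("..")
    low = position(first)
    high = position(last) if dots else low
    if not 1 <= low <= high <= len(fields):
        raise ValueError("field selection " + selector + " is out of range")
    return fields[low - 1:high]
```

For example, with the fields john, 45 and tennis, the selector $2..l selects the last two fields and $c yields the count of fields.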

In literal strings, use \/ for /, and \\ for \.
Note that other backslash character sequences are not transformed; e.g. \j remains \j.

An integer within //, e.g. /876/, is still regarded as a literal integer.

Example: printing 3rd and 5th fields separated by just a colon:

	tcols -o: from myfile $3 $5

Example: swapping the 3rd and 8th columns in a 12-column table:

	tcols from myfile $1..2 $8 $4..7 $3 $9..12

Syntax errors in expressions will cause tcols to exit with an appropriate error message, before any processing.

An expression should not contain spaces, except in string literals (in which case the whole expression must be surrounded by double quotes, e.g.: "/hi there/".)

The Expression Syntax section describes the exact grammar.


EXPRESSIONS WITH ARITHMETIC

This section describes how to use tcols's arithmetic operators: + - * / %.

Here are some expressions that show their usage:

	$1+$2	: Yields the sum of the first and the second fields.

	100-$6	: Yields the difference between 100 and the sixth field.

	$1*$2	: Yields the product of the first and second fields.

	$3/2	: Yields the third field divided by 2.

	$1%%10	: Yields the remainder of (the first field divided by 10). (Note 1)

	-$2	: Yields the second field negated. (Note 2)

Note 1: The extra % is needed to prevent the MS-DOS shell from treating %10 as the 10th command line argument.

Note 2: If you invoke tcols to use standard input/output, and the first expression starts with a '-', then put that first expression in parentheses, e.g. (-$2), so tcols doesn't think it's a command line option.

The arithmetic operators work on integers, or on expressions that evaluate to integers.

Shortcuts are possible. For example, the expression:

	($2,$3,$1)*10

applied to the input line:

	1 2 3

yields the following output:

	20 30 10

Note that the right hand side of + - * / and % must evaluate to exactly one number.
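
One way to picture the shortcut: the operator is applied to each value on the left in turn, always with the same single number on the right. The Python sketch below shows this for + - and * only; apply_to_list is a made-up name and this is not how tcols is implemented.

```python
import operator

# Only + - * are modelled here; / and % are left out of this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def apply_to_list(left_values, op, right_value):
    """Apply op to every value on the left, with one number on the right."""
    return [OPS[op](value, right_value) for value in left_values]
```

With the input line 1 2 3, the list ($2,$3,$1) holds 2, 3 and 1, so multiplying by 10 gives 20, 30 and 10.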

Unary - (minus) has the highest precedence, so the following are equivalent:

	-$2-4  (-$2)-4

* / and % have equal precedence, the next highest. They're evaluated left to right, so the following are equivalent:

	$2*$4/$2  ($2*$4)/$2

+ and binary - have the lowest precedence, and are evaluated left to right, so the following are equivalent:

	$1-$2+$5*7  ($1-$2)+($5*7)

Parentheses, ( ), can be used to override precedence:

	($1+$2)*100


EXPRESSIONS WITH FUNCTIONS

This section describes how to form expressions with function calls.

A function call has one of the forms:

	expression.functionname
	expression.functionname(arguments)

Here are some example function calls:

	$1.suqt		: Yields first field with surrounding single quotes removed.

	$2.clip(3,5)	: Yields second field with 3 leftmost and 5 rightmost
			  characters clipped off.

	$1..5.rjf(8)	: Yields first .. fifth fields right justified in fields (sorry!)
			  of 8 spaces.

As a shortcut, expressions can be grouped with ( ) and then fed to a function:

	($1,$3,$4,$8).suqt   : Yields first, third, fourth, and 
	                       eighth fields without surrounding
                               single quotes ('').

This saves you from writing:

	$1.suqt $3.suqt $4.suqt $8.suqt

Some functions are only meaningful when applied to several expressions:

	($1,$4,$7).cat	: Yields the concatenation of the first, fourth, 
	                  and seventh fields.

Function calls can be chained:

	$r.subs(1,10).upp          : Yields first 10 characters in upper case.

	$1..l.sum.rjf(10).padl(0)  : Yields sum of all fields, right justified in
	                             field of 10 characters padded with 0's.

Any expression can be used as a function argument:

	$3.rig($1.len)   : Yields the N rightmost characters of the third field,
	                   where N is the length of the first field.

If a function is given the wrong number of arguments, or the wrong type of arguments, tcols will print an error message to standard error (or logfile, if used) and exit. However, if you use the -w command line option, tcols will skip the offending input line, print a warning to standard error (or logfile, if used), and continue processing the next input line; see the Errors During Processing section.

The Function Library section describes all functions and their required arguments.


ERRORS DURING PROCESSING

A processing error occurs if the contents of an input line prevent tcols from evaluating your expressions.

tcols's default error action is to print a relevant error message and exit.

However, if you set the -w command line option, tcols will skip the bad input line and continue processing the next input line. tcols prints a warning anyway.

tcols prints error messages and warnings to standard error (or the logfile, if used).

Here are some typical processing errors:

The input line does not contain enough fields to satisfy the greatest field selection, e.g. $5 for a line with just 4 fields.

A function is given arguments of the wrong type, e.g. $3.clip(1,/abc/). (clip requires two integer arguments.)

A function is given an out-of-range argument, e.g. $1.rjf(1000). (The argument to rjf must be in the range 1..255.)

A function is given too few arguments, e.g. $3.clip(2).

tcols is rather strict about input data. For example, the sum function will only work on integer arguments, even though I could have made it ignore non-integer arguments. My reasoning is: tcols will often be used for processing hand-typed data. Typists sometimes hit the wrong keys. If tcols were lax about bad input data, it might quietly produce bad output data.
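
The difference between the default action and -w can be sketched as a processing loop. This Python model is an illustration only: process, sum_fields and demo are hypothetical names, the message wording is invented, and fields are simply split on whitespace.

```python
import io
import sys

def sum_fields(fields):
    # Strict like tcols's sum: int() raises ValueError on a non-integer field.
    return [str(sum(int(field) for field in fields))]

def process(lines, evaluate, warn_only=False, log=sys.stderr):
    """Evaluate each line; abort on the first bad line unless warn_only is set."""
    output = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            output.append("")           # empty lines are copied, not evaluated
            continue
        try:
            output.append("\t".join(evaluate(fields)))
        except ValueError as err:
            if not warn_only:
                print(f"error: line {number}: {err}", file=log)
                raise                   # default action: stop processing
            print(f"warning: line {number} skipped: {err}", file=log)
    return output

def demo():
    # Capture the log in memory instead of writing to standard error.
    log = io.StringIO()
    lines = ["1 2 3", "", "4 x 6", "7 8 9"]
    return process(lines, sum_fields, warn_only=True, log=log), log.getvalue()
```

In demo, the third line contains a mistyped field; with warn_only set it is skipped and reported, and the remaining lines are still processed.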


MORE EXAMPLES

This section gives more examples of complete tcols commands.

These examples start with the file "books" which contains:

	Poe       'Edgar Allen'     "Selected Stories"   1879  horror
	Thompson  Jim            "The Killer Inside Me" 1950  crime
	Lem        Stanislaw      "Return From the Stars"  1961 sf
	Crumley    James              "Dancing Bear" 1983  crime
	'Le Carre'  John            "Smiley's People" 1972 spy

Now, this file looks a bit messy. You want to reformat it to look cleaner, with first names and surnames together, no single quotes around the names, and no year of publication. The command:

	tcols -o from books to books2 "($1.suqt,/, /,$2.suqt).cat.ljf(20)" $3.ljf(25) $5

prints the following to "books2":

	Poe, Edgar Allen    "Selected Stories"       horror
	Thompson, Jim       "The Killer Inside Me"   crime
	Lem, Stanislaw      "Return From the Stars"  sf
	Crumley, James      "Dancing Bear"           crime
	Le Carre, John      "Smiley's People"        spy

All right. To ease future processing, you want your book list on a field-oriented format. The command:

	tcols -o from books2 to books3 $r.subs(1,16).trt.dqt.ljf(20) $r.subs(21,43).trt.ljf(25) $l

prints the following to "books3":

	"Poe, Edgar Allen"  "Selected Stories"       horror
	"Thompson, Jim"     "The Killer Inside Me"   crime
	"Lem, Stanislaw"    "Return From the Stars"  sf
	"Crumley, James"    "Dancing Bear"           crime
	"Le Carre, John"    "Smiley's People"        spy

Now, you can use another TextTools program, trows, to print all your crime books. The command:

	trows from books3 $3=/crime/

prints to the screen:

	"Thompson, Jim"     "The Killer Inside Me"   crime
	"Crumley, James"    "Dancing Bear"           crime

Or, you can sort your books on author name, using yet another TextTools program: tsort. The command:

	tsort from books3 $1

prints to the screen:

	"Crumley, James"    "Dancing Bear"           crime
	"Le Carre, John"    "Smiley's People"        spy
	"Lem, Stanislaw"    "Return From the Stars"  sf
	"Poe, Edgar Allen"  "Selected Stories"       horror
	"Thompson, Jim"     "The Killer Inside Me"   crime


EXPRESSION SYNTAX

	expr   ::=  list

	list   ::=  arit,list
	        |   arit

	arit   ::=  arit+term
		|   arit-term
		|   term

	term   ::=  term*neg
		|   term/neg
		|   term%neg
		|   neg

	neg    ::=  -neg
		|   call

	call   ::=  call.funcname(list)
		|   call.funcname
		|   simple

	simple ::=  $M           ; M an integer >= 1
		|   $M..N        ; M,N integers >= 1, M <= N
		|   $M..l        ; M an integer >= 1
		|   $l
		|   $c
		|   $r
		|   number
		|   /string/
		|   (list)

	number ::=  one or more digits (0-9)

	string ::=  one or more printable characters, but use
		    \/ for forward-slash, \\ for backslash
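
To show how the precedence levels of this grammar fit together, here is a toy recursive-descent evaluator in Python. It is a sketch, not tcols's parser: it handles only numbers, $M field references, + - * / %, unary minus and parentheses, and it does no syntax checking, so it should only be given well-formed expressions. Integer division is assumed to truncate toward zero, as on most C compilers; this document does not specify it.

```python
import re

TOKEN = re.compile(r"\$\d+|\d+|[-+*/%()]")

def evaluate(expr, fields):
    """Evaluate an arithmetic expression against a list of field strings."""
    tokens = TOKEN.findall(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def simple():
        tok = take()
        if tok == "(":
            value = arit()
            take()                      # the closing ")"
            return value
        if tok.startswith("$"):
            return int(fields[int(tok[1:]) - 1])
        return int(tok)

    def neg():                          # unary minus binds tightest
        if peek() == "-":
            take()
            return -neg()
        return simple()

    def term():                         # * / % group left to right
        value = neg()
        while peek() in ("*", "/", "%"):
            op, right = take(), neg()
            if op == "*":
                value *= right
                continue
            quotient = abs(value) // abs(right)
            if (value < 0) != (right < 0):
                quotient = -quotient    # assumption: truncate toward zero
            value = quotient if op == "/" else value - quotient * right
        return value

    def arit():                         # + and binary - bind loosest
        value = term()
        while peek() in ("+", "-"):
            op, right = take(), term()
            value = value + right if op == "+" else value - right
        return value

    return arit()
```

Each grammar rule maps to one nested function, and a rule that refers to itself on the left (such as arit+term) becomes a loop, which gives left-to-right evaluation.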


THE FUNCTION LIBRARY

Formatting - Number base conversion - Mathematical - Miscellaneous

This section describes all tcols's functions.

E, E1, etc., in this discussion denote expressions as far as syntax is concerned, and the results of evaluating expressions as far as evaluation is concerned.


Formatting functions

sqt - suqt - dqt - duqt - upp - low - resc - desc - trl - trt - tr - prf - rjf - ljf - rig - app - pre - rev - clip - subs - padl - padt - cat


sqt - single quote

E.sqt yields E surrounded by single quotes (').

For example, sqt applied to:

	hey   yields: 'hey'
	'hey  yields: 'hey'
	hey'  yields: 'hey'
	'hey' yields: 'hey'
	'     yields: ''
	''    yields: ''
	hey\' yields: 'hey\''

sqt applied to the empty string yields: ''


dqt - double quote

dqt works exactly like sqt, but handles double quotes (").


suqt - single unquote

E.suqt yields E without surrounding single quotes (').

For example, suqt applied to:

	'hey'  yields: hey
	'hey   yields: hey
	hey'   yields: hey
	''     yields: the empty string
	'      yields: the empty string
	hey\'  yields: hey\'
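
The sqt and suqt tables are consistent with a simple rule: remove one leading quote, remove one trailing quote unless it is escaped by a backslash, and (for sqt) wrap the result in quotes. The Python sketch below encodes that reading. It is inferred from the examples only and may differ from tcols in cases the tables don't cover.

```python
def suqt(text):
    """Remove one surrounding single quote from each end, if present."""
    if text.startswith("'"):
        text = text[1:]
    # a trailing quote preceded by a backslash is escaped, so it is kept
    if text.endswith("'") and not text.endswith("\\'"):
        text = text[:-1]
    return text

def sqt(text):
    """Surround text with single quotes without doubling existing ones."""
    return "'" + suqt(text) + "'"
```

dqt and duqt would follow the same pattern with the double quote character.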


duqt - double unquote

duqt works exactly like suqt, but handles double quotes (").


upp - upper case

E.upp yields E with all letters in upper case.

upp does not touch non-letters.


low - lower case

E.low yields E with all letters in lower case.

low does not touch non-letters.


resc - re-escape

E.resc yields E with every:
	'        changed to  \'
	"        changed to  \"
	\        changed to  \\
	tab      changed to  \t
	newline  changed to  \n

For example, resc applied to:

	'ok'  yields:  \'ok\'
	a"b'  yields:  a\"b\'
	kh\k  yields:  kh\\k
	\'\"  yields:  \\\'\\\"

(Newlines can only occur as the result of desc applied to a string that contains \n)


desc - de-escape

E.desc yields E with every:
	\'  changed to  '
	\"  changed to  "                
	\\  changed to  \                
	\t  changed to  tab                
	\n  changed to  newline

desc changes every \xHH (where HH is exactly two hexadecimal digits) to the corresponding ASCII character.

desc changes every \O (where O is one, two, or three octal digits) to the corresponding ASCII character.

desc makes no other changes. For example, \z is not changed to z.
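
The two escape functions are near-inverses of each other. The following Python sketch models the rules listed for resc and desc; it is an illustration written for this document, not tcols's code.

```python
import re

RESC = {"'": "\\'", '"': '\\"', "\\": "\\\\", "\t": "\\t", "\n": "\\n"}

def resc(text):
    """Escape quotes, backslashes, tabs and newlines."""
    return "".join(RESC.get(ch, ch) for ch in text)

# backslash followed by: xHH, one to three octal digits, or one of ' " \ t n
ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{2}|[0-7]{1,3}|['\"\\tn])")
SIMPLE = {"'": "'", '"': '"', "\\": "\\", "t": "\t", "n": "\n"}

def desc(text):
    """Undo the escapes above; unknown sequences such as \\z are left alone."""
    def replace(match):
        body = match.group(1)
        if body[0] == "x":
            return chr(int(body[1:], 16))
        if body[0] in "01234567":
            return chr(int(body, 8))
        return SIMPLE[body]
    return ESCAPE.sub(replace, text)
```

Because resc doubles every backslash before desc sees it, applying desc to the result of resc gives back the original string.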


trl - trim leading blanks

E.trl yields E without leading blanks.

For example:

	/  aaa/.trl.sqt   yields: 'aaa'


trt - trim trailing blanks

E.trt yields E without trailing blanks.

For example:

	/aaa  /.trt.sqt  yields: 'aaa'


tr - trim leading and trailing blanks

E.tr yields E without leading or trailing blanks.

For example:

	/ aa a  /.tr.sqt  yields: 'aa a'


prf - print formatted (minimal printf)

(E1,E2,...).prf(f) yields (the format string) f with every n'th #s replaced by En, every ## replaced by #, every \t replaced by a real tab, and every \n replaced by a real newline.

For example:

	(/a/,/b/,/c/).prf(/#s---#s---#s/)  yields: a---b---c

There must be enough E's for the #s's. Extra E's are ignored.
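
The substitution rules of prf can be modelled with one regular expression pass. This Python sketch is an approximation for illustration; in particular, raising an error when the E's run out is an assumption about how "there must be enough E's" is enforced.

```python
import re

def prf(values, fmt):
    """Fill each #s in fmt with the next value; ## gives a literal #."""
    remaining = iter(values)

    def replace(match):
        token = match.group(0)
        if token == "#s":
            try:
                return next(remaining)
            except StopIteration:
                raise ValueError("not enough expressions for the #s markers") from None
        # the other tokens map to fixed replacements
        return {"##": "#", "\\t": "\t", "\\n": "\n"}[token]

    return re.sub(r"##|#s|\\t|\\n", replace, fmt)
```

Values that are left over once the format string is exhausted are simply not used, matching the rule that extra E's are ignored.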


rjf - right justified field

E.rjf(w) yields E right justified in a field of at least w spaces.

w must be an integer in the range 1 .. 255.

For example:

	45.rjf(7).sqt  yields:  '     45'
	45.rjf(2).sqt  yields:  '45'	
	45.rjf(1).sqt  yields:  '45'


ljf - left justified field

E.ljf(w) yields E left justified in a field of at least w spaces.

w must be an integer in the range 1 .. 255.

For example:

	45.ljf(7).sqt  yields:  '45     '
	45.ljf(2).sqt  yields:  '45'
	45.ljf(1).sqt  yields:  '45'
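
For readers who know Python, rjf and ljf behave much like str.rjust and str.ljust: a value longer than w is never cut off. The sketch below is only a model; raising ValueError stands in for tcols's processing error on an out-of-range width.

```python
def rjf(text, width):
    """Right justify text in a field of at least width characters."""
    if not 1 <= width <= 255:
        raise ValueError("width must be in the range 1..255")
    return text.rjust(width)

def ljf(text, width):
    """Left justify text in a field of at least width characters."""
    if not 1 <= width <= 255:
        raise ValueError("width must be in the range 1..255")
    return text.ljust(width)
```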


rig - right substring

E.rig(i) yields the i last characters of E.

i must be an integer greater than or equal to 0.

If E has less than i characters, E.rig(i) yields E.


app - append

(E1,E2,...).app(s) yields s appended to E1, E2, ...

Useful for appending the same string to several expressions.

For example:

	(4,5,6).app(/.00/)   yields: 4.00 5.00 6.00


pre - prepend

(E1,E2,...).pre(s) yields s prepended to E1, E2, ...

Useful for prepending the same string to several expressions.

For example:

	(2,3,4).pre(/#/)   yields: #2 #3 #4


rev - reverse

E.rev yields E reversed.

For example:

	/istanbul/.rev   yields: lubnatsi

Note that rev changes \' to '\, etc.


clip - clip off

E.clip(i,j) yields E with the i leftmost and j rightmost characters clipped off.

i and j must be integers greater than or equal to 0.

If the length of E is less than or equal to i + j, then E.clip(i,j) yields the empty string.

For example:

	/abcdefg/.clip(2,3)   yields: cd


subs - substring

E.subs(i,j) yields the i'th ... j'th characters of E.

i and j must be integers greater than or equal to 1.
j must be greater than or equal to i.

If i is greater than the length of E, E.subs(i,j) yields the empty string.
If j is greater than the length of E, E.subs(i,j) yields characters i .. length-of-E of E.

For example:

	/abcdefgh/.subs(3,6)   yields: cdef
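
The three substring functions rig, clip and subs differ mainly in how they count. The Python sketch below restates their rules with 0-based slicing; it is a model for illustration, and the ValueError cases stand in for tcols's processing errors.

```python
def rig(text, count):
    """The count last characters of text (all of it if text is shorter)."""
    return text if count >= len(text) else text[len(text) - count:]

def clip(text, left, right):
    """Text with left characters removed at the start and right at the end."""
    if len(text) <= left + right:
        return ""
    return text[left:len(text) - right]

def subs(text, first, last):
    """Characters first..last of text, counting from 1, both inclusive."""
    if first < 1 or last < first:
        raise ValueError("subs needs 1 <= first <= last")
    return text[first - 1:last]     # slicing stops at the end of text
```

Since subs counts from 1 and includes both ends, subs(i,j) yields at most j-i+1 characters.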


padl - pad leading blanks

E.padl(s) yields E with leading blanks replaced by the first character of s.

s must be exactly one character long.

For example:

	/  55/.padl(/0/)   yields: 0055


padt - pad trailing blanks

E.padt(s) yields E with trailing blanks replaced by the first character of s.

s must be exactly one character long.

For example:

	/ok   /.padt(/./)   yields: ok...
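
padl and padt only touch the blanks at one end of the value, which is why they combine well with rjf and ljf. Here is a Python sketch of both, assuming that "blanks" means space characters only; the function bodies are a model, not tcols's code.

```python
def padl(text, fill):
    """Replace leading blanks in text with the character fill."""
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    stripped = text.lstrip(" ")
    return fill * (len(text) - len(stripped)) + stripped

def padt(text, fill):
    """Replace trailing blanks in text with the character fill."""
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    stripped = text.rstrip(" ")
    return stripped + fill * (len(text) - len(stripped))
```

For example, right justifying 6 in a field of 10 and then calling padl with 0 gives a zero-padded number, the same idea as the .rjf(10).padl(0) chain.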


cat - concatenate

(E1,E2,...).cat yields the concatenation of E1, E2,...

For example:

	($2,$3,$1).cat 

applied to the input line:

	56 john zap

yields:

	johnzap56

Number base conversion functions

d2h - h2d - d2o - o2d


d2h - convert decimal to hexadecimal

E.d2h yields E in hexadecimal form.

E must be an integer in decimal form.

For example:

	256.d2h   yields: 100

If E is negative, the number of hexadecimal digits in the result depends on the type of CPU tcols is run on. (tcols uses the C 'long integer' type for internal number representation.)


h2d - convert hexadecimal to decimal

E.h2d yields E in decimal form, possibly preceded by a minus sign.

E must contain only hexadecimal digits (0..9 a..f A..F).


d2o - convert decimal to octal

E.d2o yields E in octal form.

E must be an integer in decimal form.

If E is negative, the number of octal digits in the result depends on the type of CPU tcols is run on. (tcols uses the C 'long integer' type for internal number representation.)


o2d - convert octal to decimal

E.o2d yields E in decimal form, possibly preceded by a minus sign.

E must contain only octal digits (0..7).


Mathematical functions

abs - sum


abs - absolute value

E.abs yields the absolute value of E.

E must be an integer.


sum - add up

(E1,E2,...).sum yields: E1+E2+...

E1, E2, ... must all be integers.


Miscellaneous functions

len - if - ifel - amax - amin - nmax - nmin - turn - rng - ln


len - length

E.len yields the number of characters in E.

For example:

	/mama/.len   yields: 4


if - if then

E.if(f,g) yields: g if E is equal to f; E if E is not equal to f.

If E and f are both integers, they are compared numerically; otherwise they are compared ASCII-wise.

For example:

	$1.if(20,/TWENTY/) 

applied to the input lines:

	20
	67	
	4
	0020

yields the following output lines:

	TWENTY
	67
	4
	TWENTY


ifel - if then else

E.ifel(f,g,h) yields: g if E is equal to f; h if E is not equal to f.

If E and f are both integers, they are compared numerically, otherwise they are compared ASCII-wise.

For example:

	$1.ifel(20,/TWENTY/,/other/)

applied to the input lines:

	20
	67
	4
	+0020

yields the following output lines:

	TWENTY
	other
	other
	TWENTY
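
The comparison rule shared by if and ifel can be sketched as follows. This Python model assumes that an "integer" is an optional sign followed by digits, which is inferred from the 0020 and +0020 examples rather than stated in this document. The name if_ is used because if is a reserved word in Python.

```python
import re

INTEGER = re.compile(r"[+-]?[0-9]+")   # assumption: optional sign, then digits

def same(a, b):
    """Numeric comparison if both look like integers, else plain string compare."""
    if INTEGER.fullmatch(a) and INTEGER.fullmatch(b):
        return int(a) == int(b)
    return a == b

def if_(value, test, then):
    return then if same(value, test) else value

def ifel(value, test, then, otherwise):
    return then if same(value, test) else otherwise
```

This explains why 0020 matches 20 in the examples above: both are integers, so they are compared as numbers, not character by character.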


amax - ASCII-wise greatest string

(E1,E2,...).amax yields the greatest of E1, E2, ... when compared as ASCII strings.

For example:

	($1,$2,$3).amax

applied to the input line:

	lemonade gin port 

yields:

	port


amin - ASCII-wise smallest string

(E1,E2,...).amin yields the smallest of E1, E2, ... when compared as ASCII strings.


nmax - greatest number

(E1,E2,...).nmax yields the numerically greatest of E1, E2, ..., which must all be integers.


nmin - smallest number

(E1,E2,...).nmin yields the numerically smallest of E1, E2, ..., which must all be integers.


turn - turn

(E1,E2,...).turn yields ... E2 E1

For example:

	$1..l.turn 

applied to the input line:

	56 4 11 899 66

yields:

	66 899 11 4 56


rng - range

(E1,E2,...).rng(i,j) yields: Ei ... Ej

i and j must be integers greater than or equal to 1.
i must be within the count of E1,E2,...
j must be greater than or equal to i.

For example:

	$1..l.rng(2,4)

applied to the input line:

	56 4 11 899 66

yields:

	4 11 899   
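
turn and rng operate on the list of values rather than on the characters of a single value. A Python sketch of both follows; it is illustrative only. How tcols treats a j beyond the count of values is not stated above, so this model simply stops at the last value.

```python
def turn(values):
    """The values in reverse order."""
    return list(reversed(values))

def rng(values, first, last):
    """Values first..last, counting from 1, both inclusive."""
    if first < 1 or last < first or first > len(values):
        raise ValueError("rng needs 1 <= first <= last, with first within the count")
    return values[first - 1:last]   # assumption: a too-large last is clipped
```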


ln - append "\n" string

E.ln appends a string containing just a newline character to E.

For example, the command:

	tcols -o, from myfile $1 $2.ln $3 $4

applied to the file "myfile" containing:

	this is line 1
	this is line 2

prints the following to the screen:

	this,is
	line,1
	this,is
	line,2


LIMITATIONS

This section describes tcols's limitations. Normally these limitations won't bother you, but anyway, here they are:

The maximum length of an input line is 255 characters, not counting newline. tcols will exit (with an appropriate error message) on reading an input line that is too long.

The maximum length of the result of an expression, or part of an expression, is 255 characters. If this limit is exceeded, tcols treats this as a processing error.

The range of integers depends on the compiler and CPU used, but you can assume at least -2147483647 ... 2147483647. The C type 'long int' is used for all things numerical. tcols does not detect numerical overflows and underflows, and tcols's behaviour is undefined in such cases.

The maximum total length of literal strings in the expressions is 300 characters. Note that every literal string counts one extra (unseen) character. tcols will exit if this limit is exceeded, which isn't likely.

tcols has internal tables for representing expressions and results of evaluating expressions. These tables are of fixed sizes and may become full, if you use very many/complex expressions. If so, tcols will exit, with an error message. Remedy: run tcols in several passes, using fewer/simpler expressions in each pass.

tcols will print an error message to standard error (or logfile, if used), if any of the above error situations occurs.


End of document